28 research outputs found

    INFORMATION RETRIEVAL USING LATENT SEMANTIC INDEXING

    Our capabilities for collecting and storing data of all kinds are greater than ever. On the other hand, analyzing, summarizing and extracting information from this data is harder than ever. That is why there is a growing need for fast and efficient algorithms for information retrieval. In this paper we present some mathematical models based on linear algebra that are used to extract the documents relevant to a given subject from a large set of text documents. This is a typical problem for a search engine on the World Wide Web. We use the vector space model, which is based on literal matching of terms in the documents and the queries. The vector space model is implemented by creating the term-document matrix. Literal matching of terms does not necessarily retrieve all relevant documents. Synonymy (multiple words having the same meaning) and polysemy (words having multiple meanings) are two major obstacles to efficient information retrieval. Latent Semantic Indexing represents documents by approximations and tends to cluster documents on similar topics even if their term profiles are somewhat different. This approximate representation is accomplished using a low-rank singular value decomposition (SVD) approximation of the term-document matrix. In this paper we compare the precision of information retrieval for different ranks of the SVD representation of the term-document matrix.
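The low-rank idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy vocabulary and the choice to project the query with the transposed left singular vectors are assumptions made only for illustration (other scaling conventions exist).

```python
import numpy as np

def lsi_factors(term_doc, k):
    """Rank-k truncated SVD factors of a term-document matrix (terms x docs)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def query_scores(term_doc, query, k):
    """Cosine similarity between a query and every document in the rank-k space."""
    U_k, s_k, Vt_k = lsi_factors(term_doc, k)
    q_k = U_k.T @ query              # query projected into the latent space
    docs_k = s_k[:, None] * Vt_k     # documents projected the same way (U_k^T A_k)
    num = docs_k.T @ q_k
    den = np.linalg.norm(docs_k, axis=0) * np.linalg.norm(q_k)
    return num / np.where(den == 0, 1.0, den)

# Hypothetical toy corpus. Rows: car, automobile, engine, flower, petal.
# Columns: doc0 = {car, engine}, doc1 = {automobile, engine}, doc2 = {flower, petal}.
A = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)
q = np.array([1, 0, 0, 0, 0], dtype=float)  # query containing only "car"

literal = A.T @ q                 # literal matching: doc1 shares no term with the query
latent = query_scores(A, q, k=2)  # rank-2 LSI scores
```

In this toy case literal matching gives doc1 a score of zero, while the rank-2 representation places doc1 next to doc0 because both share the term "engine". That is the synonymy effect the abstract refers to.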

    Analysis of Top 500 Central and East European Companies Net Income Using Benford’s Law

    There are numerous useful methods that can be used in data analysis to check data correctness and authenticity. One contemporary and efficient method is the application of the so-called Benford’s Law. In this paper we examine ways of applying this law to a specific set of net income figures. Our aim is to conclude whether this number set conforms to Benford’s Law. The examination focuses on the values for the top 500 Central and East European companies ranked by income. The data set contains 1,500 records and spans 3 years (2007, 2008 and 2009), with 500 net incomes per year. The research is based on the net income profit and loss subsets as well as on the absolute values of net income. The analysis covers the first-digit Benford’s Law test and shows conformance to Benford’s Law for all observed subsets.
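A first-digit test of the kind mentioned above can be sketched in Python as follows. The abstract does not say which goodness-of-fit statistic was used, so the chi-square statistic here is an assumption, as are the function names and the sample data. Benford's Law gives the expected share of leading digit d as log10(1 + 1/d).

```python
import math
from collections import Counter

def benford_expected(d):
    """Expected relative frequency of leading digit d (1..9) under Benford's Law."""
    return math.log10(1 + 1 / d)

def first_digit(x):
    """Leading non-zero digit of a number, or None for zero."""
    digits = str(abs(x)).lstrip("0.")
    return int(digits[0]) if digits else None

def benford_chi_square(values):
    """Chi-square statistic of observed first digits against Benford's Law."""
    digits = [d for d in map(first_digit, values) if d is not None]
    n = len(digits)
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * benford_expected(d)) ** 2 / (n * benford_expected(d))
        for d in range(1, 10)
    )

# 15.507 is the 5% critical value of the chi-square distribution with 8 degrees of freedom.
CRITICAL_5_PERCENT = 15.507

# Hypothetical samples, not the paper's data:
powers_of_two = [2 ** k for k in range(100)]  # known to follow Benford's Law closely
three_digit = list(range(100, 1000))          # every leading digit equally frequent
```

A statistic below the critical value means the sample is compatible with Benford's Law at the 5% level. Losses can be passed in directly because the sign is discarded before the digit is read, which matches the abstract's use of absolute values.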

    MISSING DATA PROBLEMS IN NON-GAUSSIAN PROBABILITY DISTRIBUTIONS (PROBLEMI NEDOSTAJUĆIH PODATAKA U DISTRIBUCIJAMA VJEROJATNOSTI KOJE NISU GAUSSOVE)

    Ecology as a scientific discipline has been developing rapidly and is becoming an interdisciplinary science based on Information and Communication Technologies (ICT). Discovering, integrating and analyzing huge amounts of heterogeneous data is crucial in exploring complex ecological issues. Ecoinformatics offers tools and approaches for managing environmental data, which it transforms further into information and knowledge. The development of information technologies, with special emphasis on research methods for gathering and analyzing data and on data storage and access, has significantly enhanced laboratory methods and their reports. This influences data quality as well as the research itself. Moreover, it provides a stable base for the development and the replacement of missing data. Improper handling of missing data can lead to invalid conclusions. Therefore, it is important to use adequate methods for handling missing data. This paper compares the Deleting Rows Method (Listwise Deletion Method) and six single imputation methods, namely: Last Observation Carried Forward (LOCF), Hot-deck Imputation, Group Mean Imputation, Estimated Mean Value Imputation (Regression), Mode Imputation and Median Imputation. For the purposes of this study, actual empirical data were collected from an observed technical system whose data follow non-Gaussian probability distributions. Mostly, these are asymmetric probability distributions with a tail. Data sets with missing data were created by deleting values with a random number generator. The experiment was repeated three times for each examined variable on sets of 100%, 95% and 75% of the collected data. The experiments showed that the best imputation results were provided by the Hot-deck Method, especially when a larger number of data were missing, which was confirmed by goodness-of-fit tests. Surprisingly, almost equally good results, regardless of the set size, were provided by the Listwise Deletion Method, which is much simpler.
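Several of the simpler methods named in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: missing values are represented by None, the function names are hypothetical, and Hot-deck, Group Mean and Regression imputation are left out because the abstract does not describe how they were configured.

```python
from statistics import median, mode

def listwise_delete(rows):
    """Listwise deletion: drop every row that contains a missing value."""
    return [row for row in rows if all(v is not None for v in row)]

def locf(series):
    """Last Observation Carried Forward: reuse the most recent observed value."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)  # leading gaps stay None, nothing to carry yet
    return filled

def fill_with(series, statistic):
    """Replace gaps with a statistic (e.g. median or mode) of the observed values."""
    observed = [v for v in series if v is not None]
    fill = statistic(observed)
    return [fill if v is None else v for v in series]

# Hypothetical skewed series with gaps, in the spirit of the long-tailed data described above.
sample = [1, None, 3, 10, None]
median_imputed = fill_with(sample, median)
mode_imputed = fill_with([2, 2, None, 5], mode)
carried = locf(sample)
```

For long-tailed data the median is less sensitive to the tail than the mean, which is one reason median and mode imputation are often considered alongside mean-based methods.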

    Research of Household Expenditure for Food and Non-Alcoholic Beverages in the Republic of Croatia

    The aim of this paper is to investigate household spending by income deciles. Of all the expenditure categories, only the most important one, food and non-alcoholic beverages, is considered. The research and analysis are based on the results of the Household Expenditure Survey in the Republic of Croatia. Appropriate mathematical and statistical models of expenditure on food and non-alcoholic beverages by income deciles were established. The defined models were used in further research to calculate the coefficients of elasticity. The research showed that expenditure on food and non-alcoholic beverages is inelastic, thus confirming Engel’s first law. The obtained results can also be used in planning household expenditure for future periods, since the model of expenditure by income deciles covers the period from 2000 to 2009. A model for measuring elasticity was also constructed. It refers to a 10-year period and can be used to forecast future coefficients of elasticity.
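One common way to obtain a coefficient of elasticity from decile data is to fit a log-log model, ln(expenditure) = a + b ln(income), where the slope b is the income elasticity. The abstract does not state the functional form of the authors' models, so the Python sketch below is an assumption for illustration only, and the numbers are invented rather than taken from the survey.

```python
import math

def loglog_elasticity(income, expenditure):
    """Least-squares slope of ln(expenditure) on ln(income), i.e. the income elasticity."""
    x = [math.log(v) for v in income]
    y = [math.log(v) for v in expenditure]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical decile data: food spending grows with the square root of income.
income = [100.0, 200.0, 400.0, 800.0]
food = [3.0 * v ** 0.5 for v in income]
elasticity = loglog_elasticity(income, food)
```

A coefficient below 1 means that spending on the category rises more slowly than income, which is the inelastic behaviour the abstract links to Engel's first law.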

    PROSPECTS FOR AUTOMATED RELATIONSHIP MARKETING AND CUSTOMER RELATIONSHIP MANAGEMENT VIA THE INTERNET IN CROATIA

    The Internet provides the means for diverse types of automated relationship marketing (RM) and customer relationship management (CRM) activities. In this paper, various RM activities and the potential for CRM via the Internet are discussed and analyzed in relation to recent research of the Web sites of small and medium-sized enterprises in Croatia. Finally, an outline is given of Internet-related RM activities that do not require large investment and that can be included in the e-marketing strategy of Croatian firms

    Analysis of variables that influence microbiological quality in fresh cheese production

    In this research, the variables that influence microbiological quality in fresh cheese production with respect to yeasts and moulds were analysed. Since yeasts and moulds do not survive pasteurization, their isolation from the final product is a result of subsequent contamination during milk fermentation and whey drainage. The following variables were therefore measured: total number of aerobic mesophilic bacteria (AMB) in rinsing water, total number of yeasts and moulds in rinsing water, total number of AMB in pasteurized milk for cheesemaking, total number of AMB in air, and total number of yeasts and moulds in air. These variables were analysed using descriptive statistics, principal component analysis (PCA) and an artificial intelligence method, the decision tree. With respect to the number of yeasts and moulds per gram of fresh cheese, the principal component analysis showed that the microbiological cleanliness of the air has the greatest impact on cheese quality. The decision tree method yielded internal standards for the total number of AMB and of yeasts and moulds in the air that affect the microbiological quality of fresh cheese.


    Nail Position has an Influence on Anterior Knee Pain after Tibial Intramedullary Nailing

    Our aim was to determine the possible relationship between anterior knee pain (AKP) and nail position marked as a distance from tip of nail to tibial plateau (NP) and to the tuberositas tibiae (NT). Nail position has an influence on anterior knee pain after tibial intramedullary nailing. We evaluated postoperative outcome results of 50 patients in the last 3 years with healed fractures initially treated with intramedullary (IM) reamed nails with 2 or 3 interlocking screws on both parts of the nail and with the use of medial paratendinous incision for nail entry portal. Patients marked a point on the visual analog scale (VAS) that corresponded to the level of postoperative AKP felt. Two groups of patients were formed on the basis of AKP (pain level was neglected): groups A and B, with and without pain, respectively. The difference between the two groups concerning NP measurements was statistically significant (p<0.05), but not concerning NT measurements at the p<0.05 level. Patients were classified by pain with high accuracy (98%) according to a classification tree. Symptoms of AKP did not appear if the tip of the nail position was more than 6.0 mm from the NP and more than 2.6 mm from the NT. However, for better evaluation of these results it will be necessary to examine a larger number of postoperative patients with AKP

    INFLUENCE OF BIOLOGICAL THERAPEUTICS ON PATIENT-REPORTED QUALITY-OF-LIFE OUTCOMES (WHOQOL-BREF), FUNCTIONAL SCORES AND DISEASE ACTIVITY AMONG CROATIAN PATIENTS WITH RHEUMATOID ARTHRITIS: OUR EXPERIENCE

    Background: Rheumatoid arthritis (RA) is a chronic and disabling disease with a great impact on quality of life (QOL). The aim of this study was to assess QOL and health in RA patients treated with biological disease-modifying antirheumatic drugs (bDMARDs) as opposed to those treated with conventional synthetic DMARDs (csDMARDs). We analysed four domains of QOL: physical health (D1), mental health (D2), social relationships (D3) and one’s surroundings (D4); as well as general quality of life (W1), general state of health (W2), and disease activity and physical disability. Subjects and methods: Seventy-seven RA patients (group A = 29 on bDMARDs, group B = 48 on csDMARDs) were enrolled in the study. QOL was evaluated using the WHO questionnaire (WHOQOL-BREF), disease activity using the Disease Activity Score 28 with C-reactive protein (DAS28CRP) and functional status using the Health Assessment Questionnaire (HAQ). Results: There was no statistically significant difference in mean values in the four domains of QOL, nor in the general QOL, between groups A and B. There was also no statistically significant difference regarding RA activity (3.51 vs. 3.54, p=0.56). However, we found that the general state of health variable was statistically significantly higher in group B (2.66 vs. 2.89, p=0.001), while HAQ was statistically significantly higher in group A (1.19 vs. 1.07, p=0.018), as was the duration of RA (6.25 vs. 3.75 years, p=0.0006). Statistically significant correlation was found between HAQ and W2, disease duration and D3 in group A and DAS28CRP and D1, D2, W2 and HAQ and D1 and D2 in group B. Conclusion: These findings suggest that the inclusion of bDMARDs in the treatment regimen was overdue, with RA already advancing with developed functional disability, which prevented the achievement of the primary goals of treatment: low disease activity or remission and the improvement of the patient’s QOL.
